Associated manuscript: Assessing the calibration of transition probabilities in a multistate model out of the initial state

Section 1 - Large sample analysis: moderate calibration (non-informative censoring)

The first section of this document contains the plots assessing the moderate calibration in the large development sample analysis for the pseudo-value and MLR-IPCW methods in the non-informative censoring (NIC) scenario. To showcase each method's ability to appropriately assess non-linear patterns of miscalibration, there is a separate plot for each method, containing the calibration plots for the perfectly calibrated, over predicting and under predicting transition probabilities. These plots are of the same type as Figure 2 from the main manuscript.


Figure S1: Assessment of moderate calibration for the pseudo-value approach in scenario NIC, large sample analysis


Figure S2: Assessment of moderate calibration for the MLR-IPCW approach in scenario NIC, large sample analysis

Section 2 - Large sample analysis: moderate calibration (weakly and strongly informative censoring)

The second section of this document contains the plots assessing the moderate calibration in the large development sample analysis for the BLR-IPCW, pseudo-value and MLR-IPCW methods in the weakly and strongly informative censoring scenarios (WIC and SIC). To compare the bias of each method in the presence of informative censoring, there is a separate plot for each type of predicted transition probability, in which all three methods (BLR-IPCW, pseudo-value and MLR-IPCW) are compared. These plots are of the same type as Figures 3 and 4 from the main manuscript.


Figure S3: Assessment of moderate calibration for each method

Scenario = WIC, Over predicting transition probabilities

Figure S4: Assessment of moderate calibration for each method

Scenario = WIC, Under predicting transition probabilities

Figure S5: Assessment of moderate calibration for each method

Scenario = SIC, Over predicting transition probabilities

Figure S6: Assessment of moderate calibration for each method

Scenario = SIC, Under predicting transition probabilities

Section 3 - Large sample analysis: moderate and mean calibration sensitivity analyses

The third section of this document contains the plots assessing how robust BLR-IPCW and MLR-IPCW are to misspecification of the weights. We considered four options:

-BLR-IPCW: weights were estimated from the data using a correctly specified model, as was done in the main manuscript.

-BLR: no inverse probability of censoring weights were applied in the calibration models.

-BLR-IPCW-MISS: weights were estimated from the data using a misspecified model that did not adjust for the predictor variables (a Kaplan-Meier estimate of being censored).

-BLR-IPCW-DGM: weights were calculated directly from the data generating mechanism, rather than being estimated from the data.

Note that the same four options were applied to both BLR and MLR.
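To make the misspecified option (BLR-IPCW-MISS) more concrete, the following is a minimal sketch of inverse probability of censoring weights built from a Kaplan-Meier estimate of the censoring distribution that ignores the predictor variables. It is not the code used for the analyses: the function names `km_censoring_curve` and `ipcw_weights` are hypothetical, and it assumes a simplified setting with one observed time per individual and an indicator of whether that time is an observed event (rather than a censoring time).

```python
import numpy as np

def km_censoring_curve(times, event_observed):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the 'event'.

    Returns the distinct censoring times and the step-function values,
    where surv[k] is the value of G after the k-th censoring time (surv[0] = 1).
    """
    times = np.asarray(times, dtype=float)
    censored = ~np.asarray(event_observed, dtype=bool)
    cens_times = np.unique(times[censored])
    # One product-limit factor per distinct censoring time.
    factors = [
        1.0 - np.sum((times == u) & censored) / np.sum(times >= u)
        for u in cens_times
    ]
    surv = np.concatenate([[1.0], np.cumprod(factors)])
    return cens_times, surv

def ipcw_weights(times, event_observed, t_eval):
    """Weights at evaluation time t_eval (assumed simplification):
    0 if censored before t_eval, 1/G(T-) if the event occurred at T <= t_eval,
    and 1/G(t_eval) if still under follow-up after t_eval."""
    times = np.asarray(times, dtype=float)
    event = np.asarray(event_observed, dtype=bool)
    cens_times, surv = km_censoring_curve(times, event)

    weights = np.zeros(len(times))
    at_risk = times > t_eval
    had_event = event & (times <= t_eval)

    if at_risk.any():
        # G(t_eval): include censoring times up to and including t_eval.
        weights[at_risk] = 1.0 / surv[np.searchsorted(cens_times, t_eval, side="right")]
    # G(T-): left limit, so only censoring times strictly before T count.
    idx = np.searchsorted(cens_times, times[had_event], side="left")
    weights[had_event] = 1.0 / surv[idx]
    return weights
```

Because the Kaplan-Meier estimate does not condition on the predictors, these weights are the same function of time for every individual, which is what makes them misspecified when censoring depends on the predictors.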

We expect BLR-IPCW-DGM to be optimal. We consider the most important comparison to be with BLR-IPCW-MISS, which applies the weighting, but in a sub-optimal manner. The key conclusion from these figures is that in scenarios WIC and SIC, even when the weights are misspecified (BLR-IPCW-MISS and MLR-IPCW-MISS), the drop in performance is not large. If no weights are applied at all (the 'BLR' and 'MLR' approaches), there is a considerable drop in performance. This is true even when assessing mean calibration, and is most evident in Figure XXXX.


Figure S7: Misspecification of weights, BLR

Scenario = M1C1, Perfectly calibrated transition probabilities

Figure S8: Misspecification of weights, BLR

Scenario = M1C1, Over predicting transition probabilities

Figure S9: Misspecification of weights, BLR

Scenario = M1C1, Under predicting transition probabilities

Figure S10: Misspecification of weights, BLR

Scenario = M1C2, Perfectly calibrated transition probabilities

Figure S11: Misspecification of weights, BLR

Scenario = M1C2, Over predicting transition probabilities

Figure S12: Misspecification of weights, BLR

Scenario = M1C2, Under predicting transition probabilities

Figure S13: Misspecification of weights, BLR

Scenario = M1C3, Perfectly calibrated transition probabilities

Figure S14: Misspecification of weights, BLR

Scenario = M1C3, Over predicting transition probabilities

Figure S15: Misspecification of weights, BLR

Scenario = M1C3, Under predicting transition probabilities

Figure S16: Misspecification of weights, MLR

Scenario = M1C1, Perfectly calibrated transition probabilities

Figure S17: Misspecification of weights, MLR

Scenario = M1C1, Over predicting transition probabilities

Figure S18: Misspecification of weights, MLR

Scenario = M1C1, Under predicting transition probabilities

Figure S19: Misspecification of weights, MLR

Scenario = M1C2, Perfectly calibrated transition probabilities

Figure S20: Misspecification of weights, MLR

Scenario = M1C2, Over predicting transition probabilities

Figure S21: Misspecification of weights, MLR

Scenario = M1C2, Under predicting transition probabilities

Figure S22: Misspecification of weights, MLR

Scenario = M1C3, Perfectly calibrated transition probabilities

Figure S23: Misspecification of weights, MLR

Scenario = M1C3, Over predicting transition probabilities

Figure S24: Misspecification of weights, MLR

Scenario = M1C3, Under predicting transition probabilities

Figure S25: Bias (CI) in estimation of mean calibration, misspecification of weights for BLR.

Scenario = M1C3, Under predicting transition probabilities

Figure S26: Bias (CI) in estimation of mean calibration, misspecification of weights for MLR.

Scenario = M1C3, Under predicting transition probabilities

Section 4 - Small sample analysis: mean calibration

This section contains the mean calibration plots (median and 2.5 - 97.5 percentile range) for the small sample analysis when patients were split into a smaller number of groups (5 and 10) before estimating mean calibration using the Aalen-Johansen (AJ) estimator. Results are also presented for sample size N = 1500, although results could not be obtained for N = 1500 with 20 groups, as the groups were too small for the Aalen-Johansen estimator to be computed.
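As an illustration of the summary shown in these plots, the sketch below computes the median and 2.5 - 97.5 percentile range of the bias across simulation replicates. It is a hedged example only: the function name `summarise_bias` is hypothetical, and bias is assumed here to be the estimated mean calibration minus its true value.

```python
import numpy as np

def summarise_bias(estimates, true_value):
    """Median and 2.5 - 97.5 percentile range of bias over replicates.

    Assumption: bias = estimate - true_value for each simulation replicate.
    """
    bias = np.asarray(estimates, dtype=float) - true_value
    lower, median, upper = np.percentile(bias, [2.5, 50.0, 97.5])
    return {"median": median, "lower": lower, "upper": upper}
```

A percentile range is used rather than a mean and standard error because, in small samples, the distribution of the estimates across replicates may be skewed.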


Figure S27: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of each estimator for mean calibration. N = 3000, groups = 10.

Scenario = M1C3, Under predicting transition probabilities

Figure S28: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of each estimator for mean calibration. N = 3000, groups = 5.

Scenario = M1C3, Under predicting transition probabilities

Figure S29: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of each estimator for mean calibration. N = 1500, groups = 10.

Scenario = M1C3, Under predicting transition probabilities

Figure S30: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of each estimator for mean calibration. N = 1500, groups = 5.

Scenario = M1C3, Under predicting transition probabilities

Section 5 - Clinical example

This section contains the moderate calibration plot for the clinical example when using a development dataset of size N = 100,000. Both models (development sample sizes N = 5,000 and N = 100,000) were validated in the same validation dataset of size N = 100,000. The closer grouping of points in the MLR-IPCW calibration scatter plot is evident for the model with development sample size N = 100,000, indicating a better calibrated model.


Figure S31: Moderate calibration according to each method (development sample size N = 100,000)